Functional Natural Policy Gradients
Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus
Personalized decision policies are increasingly central in areas such as healthcare [Bertsimas et al., 2017], education [Mandel et al., 2014], and public policy [Kube et al., 2019], where tailoring actions to individual characteristics can improve outcomes. In many of these settings, however, actively experimenting with new policies to generate "online data" is expensive, risky, or infeasible, which motivates methods that can evaluate and optimize policies using pre-existing "offline data." One line of work studies semiparametric efficient estimation of the value of a fixed policy from offline data [Chernozhukov et al., 2018, Dudík et al., 2011, Jiang and Li, 2016, Kallus and Uehara, 2020, 2022, Kallus et al., 2022, Scharfstein et al., 1999]. A second line of work selects the policy that optimizes such estimates over a given policy class [Athey and Wager, 2021, Chernozhukov et al., 2019, Foster and Syrgkanis, 2023, Kallus, 2021, Zhang et al., 2013, Zhou et al., 2023], which generally yields regret rates that scale with the complexity of the policy class, e.g., $O_P(N^{-1/2})$ for VC classes. Luedtke and Chambaz [2020] accelerate the regret rate to $o_P(N^{-1/2})$ by leveraging an equicontinuity argument.